Abstract. We consider non-parametric Bayesian estimation of the drift
coefficient of a one-dimensional stochastic differential equation from
discrete-time observations on the solution of this equation. Under suitable
regularity conditions that are weaker than those previously suggested in the
literature, we establish posterior consistency in this context. Furthermore,
we show that posterior consistency extends to the multidimensional setting as
well, which, to the best of our knowledge, is a new result in this setting.



1. Introduction 
Consider the $d$-dimensional stochastic differential equation

(1)    $dX_t = b(X_t)\,dt + dW_t$

driven by a $d$-dimensional Brownian motion $W$, and assume that it has a unique
(in the sense of the probability law) non-exploding weak solution. One can start
with a coordinate mapping process $X$ (that is, $X_t(\omega) = \omega(t)$) on the
canonical space $(C(\mathbb{R}_+), \mathcal{B}(C(\mathbb{R}_+)))$ of continuous
functions $\omega : \mathbb{R}_+ \to \mathbb{R}^d$, a flow of sigma-fields
$\{\mathcal{F}_t^X\}$ and the $d$-dimensional Wiener measure $Q$ on
$(C(\mathbb{R}_+), \mathcal{B}(C(\mathbb{R}_+)))$, and then, as is well-known
(see e.g. Proposition 3.6 and Remark 3.7 on p. 303 in Karatzas and Shreve (1988)),
under suitable conditions on the drift coefficient $b$ and for any fixed initial
distribution $\mu$ one can obtain a weak solution $(X, W)$,
$(C(\mathbb{R}_+), \mathcal{F}, P_{b,\mu})$, $\{\mathcal{F}_t\}$ to (1) through
the Girsanov theorem. The filtration $\{\mathcal{F}_t\}$ can be made to satisfy
the usual conditions by suitably augmenting and completing the filtration
$\{\mathcal{F}_t^X\}$, cf. Remark 3.7 on p. 303 in Karatzas and Shreve (1988).
Henceforth we will assume that we are in this canonical setup. We will also
assume that $X$ is ergodic with a unique ergodic distribution $\mu_b$ and is in
fact initialised at $\mu_b$, so that $\mu = \mu_b$. Furthermore, we will
abbreviate $P_{b,\mu_b}$ to $P_b$.

Suppose that the drift coefficient $b = (b_1, \ldots, b_d)$ belongs to some
non-parametric class. Denote by $b_0 = (b_{0,1}, \ldots, b_{0,d})$ the true drift
coefficient and assume that corresponding to it a sample
$X_0, X_\Delta, X_{2\Delta}, \ldots, X_{n\Delta}$ is given. The goal is to
estimate $b_0$ non-parametrically. The problem of non-parametric estimation of
$b_0$ from discrete-time observations has received considerable attention in the
literature. For frequentist approaches to the problem see for instance
Comte et al. (2007), Gobet et al. (2004) and Jacod (2000) in the one-dimensional
case ($d = 1$) and Dalalyan and Reiß (2007) and Schmisser (2013) in the general
multidimensional case ($d \geq 1$). However, a non-parametric Bayesian approach
to estimation of $b_0$ is also possible, see e.g. van der Meulen et al. (2012),
van der Meulen and van Zanten (2013) and van Zanten (2012). In particular, under
appropriate assumptions on the drift coefficient $b$, the weak solution to (1)
will admit transition densities $p_b(t, x, y)$, and employing the Markov
property, the likelihood corresponding to the observations $X_{i\Delta}$ can be
written as

(2)    $\pi_b(X_0) \prod_{i=1}^{n} p_b(\Delta, X_{(i-1)\Delta}, X_{i\Delta}),$

where $\pi_b$ denotes a density of the distribution $\mu_b$ of $X_0$ (under our
conditions $\pi_b$ and $p_b$ will be strictly positive and finite, see Sections 2
and 3 for details). A Bayesian would put a prior $\Pi$ on the class of drift
coefficients, say $\mathcal{X}$, and obtain a posterior measure of any measurable
set $B \subset \mathcal{X}$ through Bayes' formula

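(3)    $\Pi(B \mid X_0, \ldots, X_{n\Delta}) = \dfrac{\int_B \pi_b(X_0) \prod_{i=1}^{n} p_b(\Delta, X_{(i-1)\Delta}, X_{i\Delta})\, \Pi(db)}{\int_{\mathcal{X}} \pi_b(X_0) \prod_{i=1}^{n} p_b(\Delta, X_{(i-1)\Delta}, X_{i\Delta})\, \Pi(db)}.$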

Here we tacitly assume suitable measurability of the integrands, so that the
integrals in (3) are well-defined. In the Bayesian paradigm, the posterior
encapsulates all the information required for inferential purposes. Once the
posterior is available, one can proceed with computation of Bayes point
estimates, credible sets and other quantities of interest in Bayesian statistics.
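
To make formulae (2) and (3) concrete, the following minimal Python sketch may help. It is an illustration of ours and not part of the analysis: it assumes $d = 1$, the linear drift $b(x) = -\beta x$ (for which the stationary and transition densities are Gaussian and explicit), and a prior with finitely many atoms on a hypothetical grid of values of $\beta$. Such a prior is parametric and is used here only because it makes the integrals in (3) finite sums; the function names `simulate_ou`, `log_likelihood` and `posterior_on_grid` are our own.

```python
import math
import random


def simulate_ou(beta, delta, n, seed=0):
    """Exact draw of X_0, X_delta, ..., X_{n delta} for dX = -beta X dt + dW,
    started from the stationary law N(0, 1 / (2 beta))."""
    rng = random.Random(seed)
    stat_var = 1.0 / (2.0 * beta)
    decay = math.exp(-beta * delta)
    step_sd = math.sqrt(stat_var * (1.0 - decay * decay))
    x = rng.gauss(0.0, math.sqrt(stat_var))
    xs = [x]
    for _ in range(n):
        x = decay * x + step_sd * rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs


def log_likelihood(beta, xs, delta):
    """Logarithm of the likelihood (2) for the drift b(x) = -beta x."""
    stat_var = 1.0 / (2.0 * beta)
    decay = math.exp(-beta * delta)
    step_var = stat_var * (1.0 - decay * decay)
    # log pi_b(X_0): stationary Gaussian density
    ll = -0.5 * math.log(2.0 * math.pi * stat_var) - xs[0] ** 2 / (2.0 * stat_var)
    # sum of log p_b(delta, X_{(i-1) delta}, X_{i delta}) over the sample
    for prev, cur in zip(xs, xs[1:]):
        resid = cur - decay * prev
        ll += -0.5 * math.log(2.0 * math.pi * step_var) - resid ** 2 / (2.0 * step_var)
    return ll


def posterior_on_grid(betas, prior, xs, delta):
    """Bayes' formula (3) for a prior with finitely many atoms."""
    logs = [math.log(p) + log_likelihood(b, xs, delta) for b, p in zip(betas, prior)]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical setup: five candidate drifts, uniform prior, true beta = 1.
betas = [0.25, 0.5, 1.0, 2.0, 4.0]
prior = [0.2] * 5
xs = simulate_ou(beta=1.0, delta=0.5, n=2000, seed=1)
posterior = posterior_on_grid(betas, prior, xs, delta=0.5)
```

With a sample of this size the list `posterior` is expected to put nearly all of its mass on the atom at the true value, which is the finite-grid counterpart of the posterior concentration discussed in this section.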

It has been argued convincingly in Diaconis and Freedman (1986) and elsewhere
that a desirable property of a Bayes procedure is posterior consistency. In our
context this will mean that for every neighbourhood (in a suitable topology)
$U_{b_0}$ of $b_0$,

$\Pi(U_{b_0}^c \mid X_0, \ldots, X_{n\Delta}) \to 0, \quad P_{b_0}\text{-a.s.}$

as $n \to \infty$ (see Sections 2 and 3 for details). That is, roughly speaking,
a consistent Bayesian procedure asymptotically puts posterior mass equal to one
on every fixed neighbourhood of the true parameter: the posterior concentrates
around the true parameter. In an infinite-dimensional setting, such as the one we
are dealing with, posterior consistency is a subtle property that depends in an
essential way on a specification of the prior, see e.g. Diaconis and Freedman
(1986). Note also that the notion of posterior consistency depends on the
topology on $\mathcal{X}$. Ideally one would like to establish posterior
consistency in strong topologies. An implication of posterior consistency is that
even though two Bayesians might start with two different priors, the role of the
prior in their inferential conclusions will asymptotically, with the sample size
growing indefinitely, wash out, and the two will eventually agree. Furthermore,
posterior consistency also implies that the centre (in an appropriate sense) of
the posterior distribution is a consistent (in the frequentist sense) estimator
of the true parameter. For an introductory treatment of posterior consistency
see Wasserman (1998).



In the context of discretely observed scalar diffusion processes given as
solutions to stochastic differential equations, posterior consistency has been
recently addressed in van der Meulen and van Zanten (2013), while the case when a
continuous record of observations from a scalar diffusion process is available
was covered under various setups in van der Meulen et al. (2006), Panzar and
van Zanten (2009) and Pokern et al. (2013), where in particular the contraction
rates of the posterior were derived. The techniques used in the latter three
papers are of little use in the case of discrete observations. The proof of
posterior consistency in van der Meulen and van Zanten (2013) is based on the use
of martingale arguments in a fashion similar to Tang and Ghosal (2007), see also
Ghosal and Tang (2006). The latter paper deals with posterior consistency for
estimation of the transition density of an ergodic Markov process. The idea of
using martingale arguments in the proofs of consistency of non-parametric
Bayesian procedures goes back to Walker (2003) and Walker (2004) in the i.i.d.
setting. On the other hand, the similarity between the arguments used in the
proof of posterior consistency in Tang and Ghosal (2007) and van der Meulen and
van Zanten (2013) is to a considerable extent on a conceptual level only:
conditions for posterior consistency in Tang and Ghosal (2007) involve conditions
on transition densities that typically cannot be transformed into conditions on
the drift coefficients, because transition densities associated with stochastic
differential equations are usually unknown in explicit form. Furthermore, in the
setting of van der Meulen and van Zanten (2013), who deal with ergodic and
strictly stationary scalar diffusion processes (in particular, $X_0$ is
initialised at the ergodic distribution of the process $X$), one cannot assume
that the density $\pi_{b_0}$ of $X_0$ is known (as done on p. 1714 in Tang and
Ghosal (2007)), for that would completely determine the unknown drift coefficient
$b_0$.



The assumption on the class of drift coefficients in Theorem 3.5 of
van der Meulen and van Zanten (2013) (the latter deals with posterior
consistency), namely uniform boundedness of the drift coefficients, is quite
restrictive in that it excludes even such a prototypical example of a stochastic
differential equation as the Langevin equation (here we assume $d = 1$)

(4)    $dX_t = -\beta X_t\,dt + \sigma\,dW_t,$

where $\beta$ and $\sigma$ are two constants. A solution to (4) is called an
Ornstein-Uhlenbeck process, see Example 6.8 on p. 358 in Karatzas and Shreve
(1988) and p. 397 there. Hence, there is room for improvement.

In this work we will show that under suitable conditions posterior consistency in
the one-dimensional case still holds for the class of unbounded drift
coefficients satisfying the linear growth condition. In particular, the case of
the Langevin equation is covered. In our proof of posterior consistency we follow
the same train of thought as initiated in Walker (2003) and Walker (2004), at the
same time making use of ideas from Tang and Ghosal (2007) and especially from
van der Meulen and van Zanten (2013). According to van der Meulen and van Zanten
(2013), p. 51, the boundedness condition on the drift coefficients cannot be
avoided in their approach due to technical reasons. Our analysis, however, shows
that given a willingness to assume some reasonable and classical conditions on
the drift coefficients, the case of unbounded drift coefficients can also be
covered via techniques similar to those in van der Meulen and van Zanten (2013).
Perhaps more importantly, under some extra, but standard assumptions in
non-parametric inference for multidimensional stochastic differential equations
(cf. Dalalyan and Reiß (2007) and Schmisser (2013)), we show that our analysis in
the one-dimensional case extends to the multidimensional setting as well. To the
best of our knowledge, this is a new result in this context.

The rest of the paper is organised as follows: in the next section we state our
main result in the one-dimensional case, while Section 3 deals with the general
multidimensional case. In Section 4 we provide a brief discussion on the obtained
results. The proofs of the results from Sections 2 and 3 are given in Section 5.
Finally, Appendices A and B contain several auxiliary statements used in
Section 5 together with their proofs.

2. Posterior consistency: one-dimensional case 

In this section we consider the one-dimensional case (d = 1). The class of drift 
coefficients we will be looking at will be a subset of the class X(K) introduced 
below. 

Definition 1. The family $\mathcal{X}(K)$ consists of Borel-measurable drift
coefficients $b : \mathbb{R} \to \mathbb{R}$ possessing the following two
properties:

(a) for some constant $K > 0$ and $\forall b \in \mathcal{X}(K)$, the linear
growth condition

$|b(x)| \leq K(1 + |x|)$

is satisfied, and

(b) for each $b \in \mathcal{X}(K)$ there exist two constants $r_b > 0$ and
$M_b > 0$, such that

$b(x)\,\operatorname{sgn}(x) \leq -r_b, \quad \forall |x| \geq M_b$

holds.

Remark 1. Analogously to considering $L_p$-spaces instead of
$\mathcal{L}_p$-spaces, we will identify two functions $b_1$ and $b_2$ in
$\mathcal{X}(K)$, if $b_1 = b_2$ Lebesgue a.e. □

Remark 2. The class $\mathcal{X}(K)$ is such that the case of the Langevin
equation (4) with $\sigma = 1$ is covered for parameter $\beta$ ranging in the
interval $(0, K]$. □
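
The claim of Remark 2 can be checked directly against Definition 1. The short verification below is ours; the particular choice $M_b = 1$ is an assumption made only for concreteness (any $M_b > 0$ works, with $r_b = \beta M_b$).

```latex
% Langevin drift with \sigma = 1 and 0 < \beta \le K
\begin{align*}
b(x) &= -\beta x, \\
|b(x)| &= \beta |x| \le K |x| \le K(1 + |x|)
  && \text{(property (a))}, \\
b(x)\,\operatorname{sgn}(x) &= -\beta |x| \le -\beta
  && \text{for all } |x| \ge 1 \quad \text{(property (b) with } r_b = \beta,\ M_b = 1\text{)}.
\end{align*}
```

The drift is unbounded, so it falls outside any class of uniformly bounded drift coefficients, while both requirements of Definition 1 are met.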

The goal of the following proposition is to show that when
$b \in \mathcal{X}(K)$, a unique non-exploding weak solution to (1) exists and
has certain desirable properties. Although, strictly speaking, a weak solution is
a triple $(X, W)$, $(\Omega, \mathcal{F}, P_{b,\mu})$, $\{\mathcal{F}_t\}$, in
order to avoid cumbersome formulations, in the sequel we will at times take the
liberty of calling $X$ itself a weak solution.

Proposition 1. For each $b \in \mathcal{X}(K)$, where $\mathcal{X}(K)$ is defined
in Definition 1,

(1) a unique non-exploding weak solution to (1) exists,

(2) the weak solution $X$ to (1) is ergodic with unique ergodic distribution
$\mu_b$ admitting a density $0 < \pi_b(x) < \infty$, $x \in \mathbb{R}$, with
respect to the Lebesgue measure, and

(3) transition probabilities $P_b(t, x, \cdot)$ are absolutely continuous with
respect to the Lebesgue measure with densities $0 < p_b(t, x, y) < \infty$,
$(t, x, y) \in (0, \infty) \times \mathbb{R} \times \mathbb{R}$.

Remark 3. The contents of Proposition 1 are standard, but perhaps not available
at one place in the literature. The linear growth condition (a) in Definition 1
is a standard assumption to ensure existence of a unique non-exploding weak
solution to (1), see e.g. Proposition 3.6 and Remark 3.7 on p. 303, Theorem 5.15
on p. 341 and Remark 5.19 on p. 342 in Karatzas and Shreve (1988). In fact this
assumption allows one to construct a weak solution via the Girsanov theorem as
mentioned in the beginning of Section 1. Property (b) in Definition 1 is a
classical assumption ensuring existence of a unique ergodic distribution for $X$,
see e.g. Assumption (H*) on p. 548 in Florens-Zmirou (1989). Finally, the two
properties in Definition 1 also yield a short proof of part (3) of Proposition 1.
See Section 5 for more details. □

Remark 4. Apart from its use in the proof of Proposition 1, the linear growth
condition (a) on the drift coefficients in Definition 1 is also used e.g. when
establishing formula (19) in the proof of Lemma A.4 in Appendix A and more
generally in those instances where we invoke the Girsanov theorem. Furthermore,
property (b) from Definition 1 ensures that for $b \in \mathcal{X}(K)$ the
density $\pi_b$ of the ergodic distribution of $X$ decays exponentially fast at
infinity, cf. formula (9). Hence $\mu_b$ has moments of all orders and for any
$b_1, b_2 \in \mathcal{X}(K)$, the Kullback-Leibler divergence
$K(\mu_{b_1}, \mu_{b_2}) = \int_{\mathbb{R}} \pi_{b_1}(x) \log(\pi_{b_1}(x)/\pi_{b_2}(x))\,dx$
is finite. This comes in handy in the proof of Lemma A.1 in Appendix A. □

Remark 5. Positivity of $\pi_b$ and $p_b$ formally justifies rewriting the
likelihood as in (2) and allows us to employ the likelihood ratio $L_n(b)$ in the
proof of Theorem 1. □

Remark 6. Measurability of the mapping $b \mapsto p_b(t, x, y)$ is a subtle
property essential in (3), but it is difficult to ascertain it in a general
setting. Therefore we will simply tacitly assume that all the quantities in (3)
(or in other formulae where we integrate with respect to the prior) are suitably
measurable. □

Since the notion of posterior consistency depends on a topology on the class of
drift coefficients under consideration, we have to introduce the latter first. We
will base our topology on the transition operators $P_\Delta^b$. Transition
operators associated with (1) and acting on the class of bounded measurable
functions $f : \mathbb{R} \to \mathbb{R}$ are defined by

$P_t^b f(x) = \int_{\mathbb{R}} p_b(t, x, y) f(y)\,dy.$

We want our topology to separate distinct drift coefficients, which can be
thought of as an identifiability condition. At the same time we want the
posterior measure to concentrate on arbitrarily small neighbourhoods of the true
parameter $b_0$. Fortunately, this will be possible with our choice of topology,
as it will have the required separation property.

As often happens in practice, it will be convenient in our case to define a
topology not by directly specifying the open sets, but rather by specifying a
subbase $\mathcal{U}$ (for a notion of a subbase see e.g. p. 37 in Dudley
(2002)).



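Definition 2. Let $\nu$ be a finite Borel measure on $\mathbb{R}$ that assigns
strictly positive mass to every non-empty open subset of $\mathbb{R}$, and let
$C_{bdd}(\mathbb{R})$ denote the class of all bounded continuous functions on
$\mathbb{R}$. For fixed $b \in \mathcal{X}(K)$, $f \in C_{bdd}(\mathbb{R})$ and
$\varepsilon > 0$, define

$U^b_{f,\varepsilon} = \{\tilde{b} \in \mathcal{X}(K) : \|P_\Delta^{\tilde{b}} f - P_\Delta^b f\|_{1,\nu} < \varepsilon\}.$

Here $\|\cdot\|_{1,\nu}$ denotes the $L_1$-norm with respect to the measure $\nu$.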

The following definition specifies a topology on X(K). 

Definition 3. The topology $\mathcal{T}$ on $\mathcal{X}(K)$ is determined by the
requirement that the family

$\mathcal{U} = \{U^b_{f,\varepsilon} : f \in C_{bdd}(\mathbb{R}),\ \varepsilon > 0,\ b \in \mathcal{X}(K)\}$

is a subbase for $\mathcal{T}$.

Remark 7. The topology in Definition 3 clearly depends on the choice of the
measure $\nu$, but since $\nu$ is assumed to be fixed beforehand and its specific
choice is not of great importance for subsequent developments, it is not
reflected in our notation. □

Remark 8. The fact that Definition 3 is a valid definition follows from a
standard result in general topology, Theorem 2.2.6 in Dudley (2002). □



Remark 9. The topology in Definition 3 has already been employed in
van der Meulen and van Zanten (2013), who in that respect follow Section 6 in
Tang and Ghosal (2007). For a $C^2$-function $f$ and a small $\Delta$,

$P_\Delta^b f(x) - P_\Delta^{\tilde{b}} f(x) \approx \Delta (b(x) - \tilde{b}(x)) f'(x),$

cf. p. 50 in van der Meulen and van Zanten (2013). Hence for a small $\Delta$,
the topology $\mathcal{T}$ in some sense resembles the topology induced by the
$L_1(\nu)$-norm on $\mathcal{X}(K)$. □
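
The first-order approximation in Remark 9 can be checked numerically in a case where the transition operator is explicit. The sketch below is our own illustration and rests on two assumptions not made in the text: the drifts are linear, $b(x) = -\beta x$ and $\tilde{b}(x) = -\tilde{\beta} x$, and the test function is $f = \sin$. For such a drift $X_\Delta$ given $X_0 = x$ is Gaussian with mean $m = x e^{-\beta\Delta}$ and variance $s^2 = (1 - e^{-2\beta\Delta})/(2\beta)$, and $E \sin(m + sZ) = \sin(m) e^{-s^2/2}$ for a standard normal $Z$. The function names are hypothetical.

```python
import math


def ou_operator_on_sin(beta, delta, x):
    """P_delta^b f(x) for b(x) = -beta x and f = sin, in closed form."""
    mean = x * math.exp(-beta * delta)
    var = (1.0 - math.exp(-2.0 * beta * delta)) / (2.0 * beta)
    return math.sin(mean) * math.exp(-0.5 * var)


def first_order_term(beta, beta_tilde, delta, x):
    """Delta * (b(x) - b_tilde(x)) * f'(x) with f = sin, so f' = cos."""
    return delta * (-beta * x + beta_tilde * x) * math.cos(x)


# Hypothetical values: small delta, two different linear drifts, one point x.
delta, x = 0.01, 1.0
exact = ou_operator_on_sin(1.0, delta, x) - ou_operator_on_sin(2.0, delta, x)
approx = first_order_term(1.0, 2.0, delta, x)
relative_error = abs(exact - approx) / abs(approx)
```

The remaining discrepancy is of order $\Delta^2$, so `relative_error` shrinks roughly linearly as `delta` is decreased.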

In Lemma 1 given below we will show that the topology of Definition 3 has the
Hausdorff property. This is perfectly sufficient for our purposes. For a notion
of a Hausdorff space see e.g. p. 30 in Dudley (2002).

Lemma 1. The topological space $(\mathcal{X}(K), \mathcal{T})$ with
$\mathcal{T}$ as in Definition 3 is a Hausdorff space.

We are ready to give the definition of posterior consistency used in the present
work. For a definition of a neighbourhood used in it, see p. 26 in Dudley (2002).



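Definition 4. Let the prior $\Pi$ be defined on a set
$\widetilde{\mathcal{X}}(K) \subset \mathcal{X}(K)$ and let
$b_0 \in \widetilde{\mathcal{X}}(K)$. We say that posterior consistency holds at
$b_0$, if for every neighbourhood $U_{b_0}$ of $b_0$ in the relative topology
$\widetilde{\mathcal{T}} = \{A \cap \widetilde{\mathcal{X}}(K) : A \in \mathcal{T}\}$
(with $\mathcal{T}$ as in Definition 3) we have

$\Pi(U_{b_0}^c \mid X_0, \ldots, X_{n\Delta}) \to 0, \quad P_{b_0}\text{-a.s.}$

as $n \to \infty$.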



We need yet another definition (the definition of a uniformly equicontinuous
family of functions appearing in it can be found on p. 51 in Dudley (2002)).

Definition 5. A family $\mathfrak{F}$ of functions
$f : \mathbb{R} \to \mathbb{R}$ is called locally uniformly equicontinuous, if
for any compact set $F \subset \mathbb{R}$, the restrictions $f|_F$ of the
functions $f \in \mathfrak{F}$ to $F$ form a uniformly equicontinuous family of
functions; i.e., for every $\varepsilon > 0$, there exists a $\delta > 0$, such
that the inequality

$\sup_{f \in \mathfrak{F}} \ \sup_{x, y \in F,\ |x - y| < \delta} |f(x) - f(y)| < \varepsilon$

holds.



The following will be a collection of drift coefficients we will be looking at in
our first main result, Theorem 1.

Definition 6. Let $\widetilde{\mathcal{X}}(K)$ be the collection of drift
coefficients, such that $\widetilde{\mathcal{X}}(K) \subset \mathcal{X}(K)$ and
$\widetilde{\mathcal{X}}(K)$ is a locally uniformly equicontinuous family of
functions.

Remark 10. Functions $f$ belonging to some locally uniformly equicontinuous
family $\mathfrak{F}$ of functions are obviously continuous. If the family
$\mathfrak{F}$ is such that for every compact set $F \subset \mathbb{R}$ the
restrictions $f|_F$ of the functions $f \in \mathfrak{F}$ to $F$ uniformly
satisfy a Hölder condition (i.e. a Hölder condition with the same Hölder
constants), then $\mathfrak{F}$ is a locally uniformly equicontinuous family of
functions. □

We summarise our assumptions. 

Assumption 1. Assume that

(a) a unique in law non-exploding weak solution to (1) corresponding to each
$b \in \mathcal{X}(K)$ is initialised at the ergodic distribution $\mu_b$,

(b) $b_0 \in \widetilde{\mathcal{X}}(K)$ denotes the true drift coefficient,

(c) a discrete-time sample $X_0, \ldots, X_{n\Delta}$ from the solution to (1)
corresponding to $b_0$ is available (we assume that we are in the canonical setup
as in Section 1), and finally, $\Delta$ is fixed and independent of $n$.

The following is our first main result. 

Theorem 1. Let Assumption 1 hold and suppose that the prior $\Pi$ on
$\widetilde{\mathcal{X}}(K)$ is such that

(5)    $\Pi(b \in \widetilde{\mathcal{X}}(K) : \|b - b_0\|_{2,\mu_{b_0}} < \varepsilon) > 0, \quad \forall \varepsilon > 0.$

Here $\|\cdot\|_{2,\mu_{b_0}}$ denotes the $L_2$-norm with respect to measure
$\mu_{b_0}$. Then posterior consistency as in Definition 4 holds.

Remark 11. The fact that the members $b$ of the parameter set
$\widetilde{\mathcal{X}}(K)$ must satisfy the linear growth assumption for a
uniform constant $K$, as well as the fact that $\widetilde{\mathcal{X}}(K)$ must
be a locally uniformly equicontinuous family of functions, is unfortunate, as
this excludes many interesting and popular priors in non-parametric Bayesian
statistics (for instance the Gaussian process priors; cf. Panzar and van Zanten
(2009)), but cannot be avoided with the current method of proof (cf. the remarks
on p. 51 and p. 60 in van der Meulen and van Zanten (2013)). In fact, already in
the parametric setting stronger conditions are used to theoretically justify
validity of Bayesian computational approaches, such as the ones in Beskos et al.
(2006) and Eraker (2001). We also remark that in the parametric estimation case
from discrete-time observations, some domination conditions on the drift
coefficients are still imposed in the asymptotic studies in the frequentist
literature, that do not appear to be easily dispensable, except perhaps in simple
cases like that of the Langevin equation (4); see e.g. Dacunha-Castelle and
Florens-Zmirou (1986) and Florens-Zmirou (1989). □



Remark 12. Condition (5) on the prior $\Pi$ is formulated in terms of the
$L_2(\mu_{b_0})$-neighbourhoods, while the posterior consistency assertion
returned by Theorem 1 is for the weak topology $\widetilde{\mathcal{T}}$.
However, by Remark 9, for a small $\Delta$ the 'discrepancy' is not as dramatic
as it may seem at first sight. □

Remark 13. Since $b_0$ is unknown, the prior $\Pi$ must verify (5) at all
parameter values $b \in \widetilde{\mathcal{X}}(K)$. Such priors do exist: since
conditions of Theorem 1 are implied by conditions in Theorem 3.5 in
van der Meulen and van Zanten (2013) (an assumption on the drift coefficients $b$
ensuring ergodicity of $X$ is not made explicit in van der Meulen and van Zanten
(2013), see p. 47 there, but this does not cause any problems in our setting),
two concrete examples of the prior $\Pi$ can be found in Section 4 in
van der Meulen and van Zanten (2013). □

Remark 14. The conditions of Theorem 1 cover the case of the Langevin equation
(4) with $\sigma = 1$ known and $\beta$ an unknown parameter of interest. □



3. Posterior consistency: multidimensional case 

In this section we turn our attention from the one-dimensional case to the
general multidimensional case ($d \geq 1$). The developments in this section are
parallel to those in Section 2 and involve some repetitions, so we try to be
relatively brief.

Our parameter set will be a subset of the class $\mathcal{X}(K_1, K_2)$ of drift
coefficients introduced below.

Definition 7. The family $\mathcal{X}(K_1, K_2)$ consists of Borel-measurable
drift coefficients $b : \mathbb{R}^d \to \mathbb{R}^d$ possessing the following
three properties:

(a) for any $b \in \mathcal{X}(K_1, K_2)$, there exists a $C^3$-function
$V_b : \mathbb{R}^d \to \mathbb{R}$, such that

$C_b = \int_{\mathbb{R}^d} e^{-2V_b(u)}\,du < \infty,$

$|V_b(x)|$ grows not faster than a polynomial of $\|x\|$ at infinity and
$b = -[\nabla V_b]^{tr}$, where $\nabla V_b$ is the gradient of $V_b$ and $tr$
denotes transposition;

(b) for any $b \in \mathcal{X}(K_1, K_2)$, there exist three constants
$r_b > 0$, $M_b > 0$ and $a_b \geq 1$, such that

$b(x) \cdot x \leq -r_b \|x\|^{a_b}, \quad \forall \|x\| \geq M_b,$

where by dot we denote the usual scalar product on $\mathbb{R}^d$ and $\|x\|$ is
the $L_2$-norm of a vector $x \in \mathbb{R}^d$;

(c) there exist two constants $K_1 > 0$ and $K_2 > 0$, such that for any
$b \in \mathcal{X}(K_1, K_2)$,

$\|b(x)\| \leq K_1 (1 + \|x\|), \qquad \left| \frac{\partial}{\partial x_i} b_j(x) \right| \leq K_2, \quad \forall x \in \mathbb{R}^d, \quad i, j = 1, \ldots, d.$



Remark 15. Assumptions made in Definition 7 are more than enough to guarantee
existence of the unique (in the sense of the probability law) solution to (1). By
Proposition 1 in Schmisser (2013), cf. p. 27 in Dalalyan and Reiß (2007), these
assumptions also imply existence of the unique ergodic distribution $\mu_b$ that
has the density

$\pi_b(x) = \frac{1}{C_b} e^{-2V_b(x)} > 0$

with respect to the $d$-dimensional Lebesgue measure. In models in physics the
function $V_b$ has the interpretation of the potential energy of the system.
Furthermore, Proposition 1.2 in Gobet (2002) implies existence of strictly
positive transition densities $p_b(t, x, y)$ associated with (1). Finally, for
any $b, \tilde{b} \in \mathcal{X}(K_1, K_2)$ we also have that the
Kullback-Leibler divergence $K(\mu_b, \mu_{\tilde{b}})$ is finite, which we use
in the proof of Lemma B.4 in Appendix B. □
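
As a worked instance of the density formula in Remark 15, consider the linear drift $b(x) = -\beta x$ on $\mathbb{R}^d$ with $\beta > 0$. This example is our own choice for illustration (a $d$-dimensional analogue of (4) with $\sigma = 1$), and the bound $\beta \leq \min(K_1, K_2)$ is an assumption we impose so that property (c) of Definition 7 holds.

```latex
\begin{align*}
V_b(x) &= \tfrac{\beta}{2}\|x\|^2, \qquad
b(x) = -[\nabla V_b(x)]^{tr} = -\beta x, \\
C_b &= \int_{\mathbb{R}^d} e^{-\beta \|u\|^2}\,du
     = \Bigl(\frac{\pi}{\beta}\Bigr)^{d/2} < \infty, \\
\pi_b(x) &= \Bigl(\frac{\beta}{\pi}\Bigr)^{d/2} e^{-\beta \|x\|^2}, \\
b(x) \cdot x &= -\beta \|x\|^2
  \quad \text{(property (b) with } r_b = \beta,\ a_b = 2 \text{ and any } M_b > 0\text{)}, \\
\|b(x)\| &= \beta \|x\| \le K_1 (1 + \|x\|), \qquad
\Bigl|\frac{\partial}{\partial x_i} b_j(x)\Bigr| = \beta\,\mathbf{1}_{\{i = j\}} \le K_2 .
\end{align*}
```

Thus $\mu_b$ is the centred Gaussian law on $\mathbb{R}^d$ with covariance matrix $(2\beta)^{-1} I_d$, which for $d = 1$ is the familiar stationary law of the Ornstein-Uhlenbeck process.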

Remark 16. Compared to the case $d = 1$ in Section 2, assumptions made in
Definition 7 on the class of drift coefficients, when specialised to the case
$d = 1$, are somewhat stronger. □

Remark 17. Examples of multidimensional stochastic differential equations
satisfying assumptions in Definition 7 are given in Section 5.2 in Schmisser
(2013). □

Define the transition operators associated with (1) and acting on the class of
bounded measurable functions $f : \mathbb{R}^d \to \mathbb{R}$ by

$P_t^b f(x) = \int_{\mathbb{R}^d} p_b(t, x, y) f(y)\,dy,$

and for every fixed $f \in C_{bdd}(\mathbb{R}^d)$, $\varepsilon > 0$,
$b \in \mathcal{X}(K_1, K_2)$ let

$U^b_{f,\varepsilon} = \{\tilde{b} \in \mathcal{X}(K_1, K_2) : \|P_\Delta^{\tilde{b}} f - P_\Delta^b f\|_{1,\nu} < \varepsilon\},$

where $\nu$ is a fixed finite Borel measure on $\mathbb{R}^d$ that assigns
strictly positive mass to every non-empty open subset of $\mathbb{R}^d$. Define
the topology $\mathcal{T}$ on $\mathcal{X}(K_1, K_2)$ through a subbase

$\mathcal{U} = \{U^b_{f,\varepsilon} : f \in C_{bdd}(\mathbb{R}^d),\ \varepsilon > 0,\ b \in \mathcal{X}(K_1, K_2)\}.$

In complete analogy to Lemma 1, we have the following result.

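Lemma 2. The topological space $(\mathcal{X}(K_1, K_2), \mathcal{T})$ with
$\mathcal{X}(K_1, K_2)$ as in Definition 7 is a Hausdorff space.

Let $\widetilde{\mathcal{X}}(K_1, K_2) \subset \mathcal{X}(K_1, K_2)$, with the
interpretation that $\widetilde{\mathcal{X}}(K_1, K_2)$ is our parameter set, and
let
$\widetilde{\mathcal{T}} = \{A \cap \widetilde{\mathcal{X}}(K_1, K_2) : A \in \mathcal{T}\}$
be the corresponding relative topology on $\widetilde{\mathcal{X}}(K_1, K_2)$. If
for any neighbourhood $U_{b_0} \in \widetilde{\mathcal{T}}$ of
$b_0 \in \widetilde{\mathcal{X}}(K_1, K_2)$ we have

$\Pi(U_{b_0}^c \mid X_0, \ldots, X_{n\Delta}) \to 0, \quad P_{b_0}\text{-a.s.}$

as $n \to \infty$, we will say that posterior consistency holds at $b_0$.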
We summarise our assumptions. 

Assumption 2. Assume that

(a) a unique in law non-exploding weak solution to (1) corresponding to each
$b \in \mathcal{X}(K_1, K_2)$ is initialised at the ergodic distribution $\mu_b$,

(b) $b_0 \in \widetilde{\mathcal{X}}(K_1, K_2)$ denotes the true drift
coefficient,

(c) a discrete-time sample $X_0, \ldots, X_{n\Delta}$ from the solution to (1)
corresponding to $b_0$ is available (we assume that we are in the canonical setup
as in Section 1), and finally, $\Delta$ is fixed and independent of $n$.

Under Assumption 2 the following multidimensional analogue of Theorem 1 holds.

Theorem 2. Let Assumption 2 hold and suppose that the prior $\Pi$ on
$\widetilde{\mathcal{X}}(K_1, K_2)$ is such that

(6)    $\Pi\Bigl(b \in \widetilde{\mathcal{X}}(K_1, K_2) : \sum_{i=1}^{d} \|b_i - b_{0,i}\|_{2,\mu_{b_0}} < \varepsilon\Bigr) > 0, \quad \forall \varepsilon > 0.$

Then posterior consistency holds.

Condition (6) on the prior is of the same type as condition (5) in Theorem 1. We
provide an example of a prior $\Pi$ satisfying this condition. The construction
of $\Pi$ is similar to that in Example 4.1 in van der Meulen and van Zanten
(2013). Both examples are related to discrete net priors in non-parametric
Bayesian inference problems studied in Ghosal et al. (1997). The construction is
admittedly artificial, but its sole goal is to show existence of a prior
satisfying (6).

Example 1. Let $\mathfrak{F}$ be a collection of $C^3$-functions
$f : \mathbb{R} \to \mathbb{R}$, such that

(a) for some polynomial function $G : \mathbb{R} \to \mathbb{R}$ and
$\forall f \in \mathfrak{F}$ we have

$|f(x)| \leq G(x), \quad \forall x \in \mathbb{R}_+;$

(b) for some constant $K_1 > 0$ and $\forall f \in \mathfrak{F}$ we have

$|f'(x)| \leq \frac{K_1}{2}, \quad \forall x \in \mathbb{R}_+;$

(c) $\forall f \in \mathfrak{F}$ we have

$\int_{\mathbb{R}^d} e^{-2f(\|x\|^2)}\,dx < \infty;$

(d) $\forall f \in \mathfrak{F}$ there exist two constants $M_f > 0$ and
$r_f > 0$, such that $f'(x) \geq r_f$, $\forall x \geq M_f$;

(e) for some constant $K_2 > 0$ and $\forall f \in \mathfrak{F}$,

$\sup_{x \in \mathbb{R}_+} \{4x|f''(x)| + 2|f'(x)|\} \leq K_2.$

For all $x \in \mathbb{R}^d$ set $V_f(x) = f(\|x\|^2)$ and
$b_f(x) = -[\nabla V_f(x)]^{tr}$. Let $\widetilde{\mathcal{X}}(K_1, K_2)$ be a
subset of a collection of all functions $b_f = (b_{f,1}, \ldots, b_{f,d})$
obtained in this way (the fact that this is a valid definition, in the sense that
the requirements from Definition 7 are satisfied, follows by easy, but somewhat
tedious computations; note that by taking $f_\beta(x) = \beta x/2$ and assuming
$d = 1$ and $\beta \in (0, K_1]$, we can cover the case of the Langevin equation
(4)). We get from (b) that for every fixed $i = 1, \ldots, d$, the functions
$b_{f,i}$ are locally bounded by constants uniform in $f \in \mathfrak{F}$.
Furthermore, they are Lipschitz with uniform constants in $f \in \mathfrak{F}$ as
well: by the mean value theorem,

$|b_{f,i}(x) - b_{f,i}(y)| \leq \|\nabla b_{f,i}(\lambda x + (1 - \lambda) y)\|\, \|x - y\| \leq \sqrt{d}\, K_2 \|x - y\|.$

Hence for each $m \in \mathbb{N}$ and $i = 1, \ldots, d$, by the Arzelà-Ascoli
theorem, see Theorem 2.4.7 in Dudley (2002), the collection $\mathfrak{B}_{m,i}$
of restrictions $b_{f,i}|_{[-m,m]}$ of the functions $b_{f,i}$,
$f \in \mathfrak{F}$, to the intervals $[-m, m]$ is totally bounded for the
supremum metric $\|\cdot\|_\infty$ (for the required definitions see p. 45 and
p. 52 in Dudley (2002)). Then so is the product
$\bigotimes_{i=1}^{d} \mathfrak{B}_{m,i}$ for the product metric

$\|b_f\|_{d,m,\infty} = \max_{i=1,\ldots,d} \|b_{f,i}|_{[-m,m]}\|_\infty,$

as well as its subset consisting of elements

$b_f|_{[-m,m]^d} = (b_{f,1}|_{[-m,m]}, \ldots, b_{f,d}|_{[-m,m]}), \quad f \in \mathfrak{F}.$

Take a sequence $\varepsilon_l \downarrow 0$. For any $l \in \mathbb{N}$, there
exists a finite subset
$\mathfrak{F}_{m,\varepsilon_l} = \{f_n^{m,l} \in \mathfrak{F} : n = 1, \ldots, n_{m,l}\}$
such that for any $f \in \mathfrak{F}$,
$\|b_f - b_{f_n^{m,l}}\|_{d,m,\infty} < \varepsilon_l$ for some
$n = 1, \ldots, n_{m,l}$. Let $Q_1$ and $Q_2$ be two measures on $\mathbb{N}$,
such that $q_{i,j} = Q_i(j) > 0$, $i = 1, 2$, $j \in \mathbb{N}$. The prior $\Pi$
on $\widetilde{\mathcal{X}}(K_1, K_2)$ is defined by

$\Pi = \sum_{m=1}^{\infty} \sum_{l=1}^{\infty} \sum_{n=1}^{n_{m,l}} \frac{q_{1,m}\, q_{2,l}}{n_{m,l}}\, \delta_{b_{f_n^{m,l}}},$

where $\delta_{b_{f_n^{m,l}}}$ is the Dirac measure at $b_{f_n^{m,l}}$. The fact
that $\Pi$ satisfies requirement (6) of Theorem 2 is the content of Lemma 3 in
Section 5. Since $\Pi$ assigns all its mass to a countable subset of
$\mathcal{X}(K_1, K_2)$, measurability issues should not concern us when
integrating with respect to $\Pi$. □
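
The "easy, but somewhat tedious computations" behind Example 1 can be sketched as follows. The derivation is ours and is meant only as a reading aid; it uses nothing beyond the chain rule, conditions (b), (d) and (e) of the example, and the elementary bound $|x_i x_j| \leq \|x\|^2$, with $\delta_{ij}$ denoting the Kronecker delta.

```latex
\begin{align*}
b_f(x) &= -[\nabla V_f(x)]^{tr} = -2 f'(\|x\|^2)\, x, \\
\|b_f(x)\| &= 2\,|f'(\|x\|^2)|\,\|x\| \le K_1 \|x\| \le K_1 (1 + \|x\|)
  && \text{by (b)}, \\
\frac{\partial}{\partial x_j} b_{f,i}(x)
  &= -2 f'(\|x\|^2)\,\delta_{ij} - 4 f''(\|x\|^2)\, x_i x_j, \\
\Bigl|\frac{\partial}{\partial x_j} b_{f,i}(x)\Bigr|
  &\le 2\,|f'(u)| + 4u\,|f''(u)| \le K_2, \qquad u = \|x\|^2
  && \text{by (e)}, \\
b_f(x) \cdot x &= -2 f'(\|x\|^2)\,\|x\|^2 \le -2 r_f \|x\|^2
  \quad \text{for } \|x\|^2 \ge M_f
  && \text{by (d)}.
\end{align*}
```

The second and fourth lines give property (c) of Definition 7, and the last line gives property (b) with $a_b = 2$. For $f_\beta(x) = \beta x/2$ the first line reduces to $b_{f_\beta}(x) = -\beta x$, the drift of the Langevin equation (4).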
4. Discussion



In this work we were able to demonstrate that posterior consistency for
non-parametric Bayesian estimation of the drift coefficient of a stochastic
differential equation holds not only for the class of uniformly bounded drift
coefficients and in the one-dimensional setting, as shown previously in
van der Meulen and van Zanten (2013), but also in the multidimensional setting
for the class of drift coefficients satisfying a linear growth assumption. This
considerably enlarges the scope of the main result in van der Meulen and
van Zanten (2013). Interestingly, although derivation of the posterior
consistency result in the one-dimensional case is quite involved in
van der Meulen and van Zanten (2013) (cf. a remark on p. 60 there) and
replacement of the uniform boundedness condition on the drift coefficient with
the linear growth condition requires some technical prowess in the proofs, see in
particular the proof of Lemma A.4 in Appendix A, generalisation to the
multidimensional setting does not involve technicalities far different from those
in the one-dimensional setting, provided one suitably restricts the
non-parametric class of drift coefficients. In fact, conditions we impose in the
multidimensional setting are analogous to those used in the frequentist
literature, see Dalalyan and Reiß (2007) and Schmisser (2013), which is a
comforting fact. On the other hand, posterior consistency results both in
van der Meulen and van Zanten (2013) and in our work are established for a weak
topology on the class of drift coefficients. This is a consequence of the fact
that we rely on techniques from Walker (2004) in our proofs, which are better
suited for proving posterior consistency in weak topologies. Consistency in
stronger topologies could have been established and contraction rates of the
posterior could have been derived from general results for posterior consistency
in Markov chain models had we known existence of certain tests satisfying
conditions as in formula (2.2) in Ghosal and van der Vaart (2007); see Theorem 5
there. Existence of such tests for Markov chain models has been demonstrated in
Theorem 3 in Birgé (1983), but unfortunately, the conditions involved in this
theorem, cf. also formula (4.1) in Ghosal and van der Vaart (2007), do not appear
to hold in general for the stochastic differential equation models such as the
ones that van der Meulen and van Zanten (2013) and we are considering. Hence
establishing posterior consistency in a stronger topology and derivation of the
posterior contraction rate for non-parametric Bayesian drift estimation is an
interesting and difficult open problem. A recent paper Pokern et al. (2013)
addresses the latter question for a one-dimensional stochastic differential
equation with a periodic drift coefficient. However, this is done under an
assumption that an entire sample path $\{X_t : t \in [0, T]\}$ is observed over
the time interval $[0, T]$ with $T \to \infty$. Moreover, periodic drift
coefficients are completely different from the drift coefficients considered in
Section 2 of the present work and making use of the techniques from Pokern et al.
(2013) is impossible in our setting. Neither are the techniques in
van der Meulen et al. (2006) and Panzar and van Zanten (2009) of any significant
help (these papers deal with continuously observed scalar diffusion processes).
It should also be noted that in the frequentist setting too (with $\Delta$
fixed), already in the one-dimensional setting, study of convergence rates of
non-parametric estimators of the drift and dispersion coefficients is a highly
non-trivial task, see e.g. Gobet et al. (2004), where various simplifying
assumptions have been made, such as the requirement that the diffusion process
under consideration has a compact state space, say $[0, 1]$, and is reflecting at
the boundary points. Nevertheless, some progress in establishing posterior
consistency in a stronger topology than in the present work might be possible in
the setting where $\Delta = \Delta_n \to 0$ in such a way that
$n\Delta_n \to \infty$ (the so-called high-frequency data setting).

Finally, we remark that issues associated with practical implementation of the 
non-parametric Bayesian approach to estimation of a drift coefficient are outside 
the scope of the present work. Although much remains to be done in this direction,
preliminary studies, such as the ones in Papaspiliopoulos et al. (2012) and
van der Meulen et al. (2012), see also the overview paper van Zanten (2012),
indicate that a non-parametric Bayesian approach in this context is both feasible and
leads to reasonable results.



5. Proofs 

Proof of Proposition 1. As already mentioned in Remark 3, property (a) in
Definition 1 suffices to guarantee existence of a unique non-exploding weak solution to
equation (1). This proves part (1) of the proposition.

We next prove part (2). Although the result is well-known, a detailed proof does 
not seem to be available in the literature. We provide it for the reader's convenience. 
Introduce the scale function 

\[ s_b(y) = \int_0^y \exp\Big(-2\int_0^z b(x)\,dx\Big)\,dz. \]



To show existence of an ergodic distribution, it is enough to show its existence for
a process $\tilde{X}_t = s_b(X_t)$, cf. p. 48 in Skorokhod (1987). This process satisfies the
stochastic differential equation

\[ d\tilde{X}_t = \sigma(\tilde{X}_t)\,dW_t, \]

where

\[ \sigma(y) = s_b'\big(s_b^{-1}(y)\big), \]

see Lemma 9 on p. 47 in Skorokhod (1987). Here $s_b^{-1}$ denotes the inverse of $s_b$. If
we can show that

\[ s_b(-\infty) = -\infty, \quad s_b(\infty) = \infty, \quad \int_{\mathbb{R}} \frac{1}{\sigma^2(y)}\,dy < \infty, \tag{7} \]

then Theorem 16 on p. 51 in Skorokhod (1987) will imply existence of a unique
ergodic distribution for $\tilde{X}$, and hence for $X$ too. However, under our assumptions
checking these conditions is easy. Assuming for instance that the first two conditions
in (7) have been verified (the arguments used in their verification are similar to those
used in verification of the third one), we will check the last one. By a change of the
integration variable $x = s_b^{-1}(y)$, we have

\[ \int_{\mathbb{R}} \frac{1}{\sigma^2(y)}\,dy = \int_{\mathbb{R}} \frac{1}{s_b'(x)}\,dx = \int_{\mathbb{R}} \exp\Big(2\int_0^x b(y)\,dy\Big)\,dx. \tag{8} \]
Finiteness of the latter integral can be seen as follows: let $x > M_b > 0$ and note
that

\[
\begin{aligned}
\exp\Big(2\int_0^x b(y)\,dy\Big)
&= \exp\Big(2\int_0^{M_b} b(y)\,dy\Big)\exp\Big(2\int_{M_b}^x b(y)\,dy\Big) \\
&\le \exp\Big(2K\int_0^{M_b} (1+y)\,dy\Big)\exp\Big(-2r_b\int_{M_b}^x dy\Big) \\
&= c_{b,K}\exp(-2r_b|x|).
\end{aligned}
\tag{9}
\]

The argument for negative $x$ is similar and yields a similar inequality. The integral
in (8) is then finite thanks to the exponential decay property as in (9), and existence
of a unique ergodic distribution follows. By the same Theorem 16 on p. 51 in
Skorokhod (1987), the density of the ergodic distribution of $\tilde{X}$ is given by

\[ \tilde{\pi}(x) = \frac{1}{\sigma^2(x)}\Big(\int_{\mathbb{R}} \frac{1}{\sigma^2(y)}\,dy\Big)^{-1}. \]

By the fact that $s_b$ is strictly increasing and hence $P(X_t \le x) = P(s_b(X_t) \le s_b(x))$,
it then follows by differentiation that the density of the ergodic distribution of $X$
is given by

\[ \pi_b(x) = \frac{1}{m_b(\mathbb{R})}\exp\Big(2\int_0^x b(y)\,dy\Big), \]

where

\[ m_b(dx) = \exp\Big(2\int_0^x b(y)\,dy\Big)\,dx \]

is the speed measure of $X$. Furthermore, by the linear growth condition, $0 < \pi_b(x) < \infty$,
$\forall x \in \mathbb{R}$. This proves part (2).
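
The closed form for $\pi_b$ can be checked numerically. The following Python sketch is an illustration only and not part of the original argument: it evaluates $\pi_b(x) = \exp(2\int_0^x b(y)\,dy)/m_b(\mathbb{R})$ on a truncated grid by the trapezoidal rule. The function name `invariant_density`, the grid and the test drift $b(x) = -x$ are assumptions made for the example; for this drift the formula gives the $N(0, 1/2)$ density $e^{-x^2}/\sqrt{\pi}$.

```python
import math


def invariant_density(b, grid):
    """Return pi_b on `grid` via pi_b(x) = exp(2 * int_0^x b(y) dy) / m_b(R).

    `grid` must be increasing, equally spaced, contain a point at (or very
    near) 0, and be wide enough that the unnormalised density is negligible
    at both ends. Both the grid and the truncation are assumptions of this
    sketch.
    """
    h = grid[1] - grid[0]
    zero = min(range(len(grid)), key=lambda i: abs(grid[i]))
    # Cumulative trapezoidal integral B(x) = int_0^x b(y) dy, anchored at 0.
    B = [0.0] * len(grid)
    for i in range(zero + 1, len(grid)):
        B[i] = B[i - 1] + 0.5 * h * (b(grid[i - 1]) + b(grid[i]))
    for i in range(zero - 1, -1, -1):
        B[i] = B[i + 1] - 0.5 * h * (b(grid[i + 1]) + b(grid[i]))
    weights = [math.exp(2.0 * v) for v in B]
    # m_b(R) is approximated by the trapezoidal rule on the truncated grid.
    mass = h * (sum(weights) - 0.5 * (weights[0] + weights[-1]))
    return [w / mass for w in weights]


# Hypothetical example: the drift b(x) = -x on the grid -8, -7.99, ..., 8.
grid = [-8.0 + 0.01 * i for i in range(1601)]
density = invariant_density(lambda x: -x, grid)
```

Here `density[800]` approximates $\pi_b(0) = 1/\sqrt{\pi}$, and the values can be compared pointwise with $e^{-x^2}/\sqrt{\pi}$.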

Finally, for the proof of part (3) we argue as follows: the first two equalities in
(7) and Proposition 5.22 (a) on p. 345 in Karatzas and Shreve (1988) yield that the
process $X$ is recurrent. Hence the solution to (1) generates a regular diffusion (see
Definition 45.2 on p. 272 in Rogers and Williams (1987)). By Theorem 50.11 on
pp. 294-295 in Rogers and Williams (1987), the transition probabilities of $X$ admit
continuous, strictly positive and finite densities with respect to the speed measure
$m_b(dy)$ of $X$. Since from part (2) we have in turn that

\[ 0 < \exp\Big(2\int_0^x b(z)\,dz\Big) < \infty, \quad \forall x \in \mathbb{R}, \]

it follows that transition probabilities of $X$ admit continuous, strictly positive and
finite densities $p_b(t, x, y)$ with respect to the Lebesgue measure. This completes the
proof of the proposition. □

Proof of Lemma 1. The lemma can be proved by arguments similar to those in the
proof of Lemma 3.2 in van der Meulen and van Zanten (2013); cf. also the proof of
Lemma B.1 in Appendix B. The following result, which is an analogue of Lemma 3.1
in van der Meulen and van Zanten (2013), is required in the proof (the arguments
from the proof of the latter remain applicable): let $b, \tilde{b} \in X(K)$. Fix $t > 0$. If $b \ne \tilde{b}$,
then $P_t^b \ne P_t^{\tilde{b}}$. □
Proof of Theorem 1. The proof follows the same main steps as the proof of Theorem
3.5 in van der Meulen and van Zanten (2013), which in turn uses some ideas from
Tang and Ghosal (2007) and Walker (2004). In particular, our proof employs
Lemmas A.1, A.2 and A.3 from Appendix A that correspond to Lemmas 5.1, 5.2 and
5.3 in van der Meulen and van Zanten (2013). Fix $\varepsilon > 0$, take a fixed $f \in C_{bdd}(\mathbb{R})$
and write

\[ B = \{b \in X(K) : \|P^b_\Delta f - P^{b_0}_\Delta f\|_{1,\nu} > \varepsilon\}. \tag{10} \]

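Without loss of generality we may assume that $\|f\|_\infty \le 1$ and $\varepsilon < 2\nu(\mathbb{R})$. We claim
that by the definition of the topology $\mathcal{T}$ it suffices to establish posterior consistency
for every fixed $B$ of the above form. Indeed, by intersecting the sets from the base
$\mathcal{V}$ determined by the subbase $\mathcal{U}$ from Definition 3 with $X(K)$, a base for the relative
topology on $X(K)$ can be obtained. Likewise, a subbase for the relative topology is
obtained by intersecting the sets from the subbase $\mathcal{U}$ for $\mathcal{T}$ with $X(K)$. By
definition, an arbitrary neighbourhood of $b_0$ contains a set $U_{b_0}$ that is open in the
relative topology. The set $U_{b_0}$ is the union of all sets $V$ from the base of the relative
topology such that $V \subset U_{b_0}$. There is at least one such $V$ that contains $b_0$. Fix such
$V$. By definition of the subbase this set $V$ can be represented as
$V = \bigcap_{j=1}^m U^{b_0}_{f_j,\varepsilon_j}$ for some $m$, positive numbers $\varepsilon_j$, bounded continuous
functions $f_j$ and sets $U^{b_0}_{f_j,\varepsilon_j}$ from the subbase. Note that the complement of the
neighbourhood is contained in $U_{b_0}^c$, and that

\[ U_{b_0}^c \subset V^c = \bigcup_{j=1}^m \big(U^{b_0}_{f_j,\varepsilon_j}\big)^c. \]

Since

\[
\big(U^{b_0}_{f_j,\varepsilon_j}\big)^c = \{b \in X(K) : \|P^b_\Delta f_j - P^{b_0}_\Delta f_j\|_{1,\nu} \ge \varepsilon_j\}
\subset \{b \in X(K) : \|P^b_\Delta f_j - P^{b_0}_\Delta f_j\|_{1,\nu} > \varepsilon_j/2\},
\]

say, the claim becomes obvious.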

The posterior measure of a set $B$ given in (10) can be written as

\[ \Pi(B \mid X_0,\ldots,X_{n\Delta}) = \frac{\int_B L_n(b)\,\Pi(db)}{\int_{X(K)} L_n(b)\,\Pi(db)}, \]

where

\[ L_n(b) = \frac{\pi_b(X_0)}{\pi_{b_0}(X_0)} \prod_{i=1}^n \frac{p_b(\Delta, X_{(i-1)\Delta}, X_{i\Delta})}{p_{b_0}(\Delta, X_{(i-1)\Delta}, X_{i\Delta})} \]

is the likelihood ratio. By Lemma A.3 in Appendix A, in order to prove the theorem,
it suffices to show that

\[ \Pi(B_j^+ \mid X_0,\ldots,X_{n\Delta}) \to 0, \quad \Pi(B_j^- \mid X_0,\ldots,X_{n\Delta}) \to 0, \quad P_{b_0}\text{-a.s.} \]

for the sets $B_j^+$ and $B_j^-$ ($j = 1,\ldots,N$ for some suitable integer $N > 0$) given in
the statement of that lemma. We give a brief outline of the remaining part of the
proof: thanks to property (5) of the prior, by Lemma A.1 from Appendix A the
prior $\Pi$ has the Kullback-Leibler property in the sense that (11) holds. Then by
Lemma A.2 from Appendix A, in order to establish posterior consistency, it suffices
to show that $P_{b_0}$-a.s. the terms

\[ \int_{B_j^+} L_n(b)\,\Pi(db), \quad \int_{B_j^-} L_n(b)\,\Pi(db), \]
converge to zero exponentially fast. This fact can be proved by the same reasoning
as given in the proof of Theorem 3.5 in van der Meulen and van Zanten (2013)
(employing the convergence theorem for a positive supermartingale (see e.g. Theorem
22 on p. 148 in Pollard (2002)) instead of Doob's martingale convergence theorem
on pp. 59-60 there¹). This completes the proof. □
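
To make the objects in the proof concrete, here is a minimal Python sketch of the posterior $\Pi(\cdot \mid X_0,\ldots,X_{n\Delta})$ for a prior supported on finitely many drifts. It is an illustration under strong simplifying assumptions, not the non-parametric setting of the theorem: the drifts are $b(x) = -\theta x$ (Ornstein-Uhlenbeck), for which $\pi_b$ and $p_b(\Delta, x, y)$ are Gaussian and known in closed form, and all function names are hypothetical. The common denominator of $L_n(b)$ cancels in the posterior, so the code works with the likelihood directly.

```python
import math
import random


def ou_transition_logpdf(theta, delta, x, y):
    """Log transition density of dX = -theta * X dt + dW over a step delta."""
    mean = x * math.exp(-theta * delta)
    var = (1.0 - math.exp(-2.0 * theta * delta)) / (2.0 * theta)
    return -0.5 * math.log(2.0 * math.pi * var) - (y - mean) ** 2 / (2.0 * var)


def ou_stationary_logpdf(theta, x):
    """Log density of the invariant law N(0, 1 / (2 * theta))."""
    var = 1.0 / (2.0 * theta)
    return -0.5 * math.log(2.0 * math.pi * var) - x ** 2 / (2.0 * var)


def posterior_weights(thetas, prior, delta, path):
    """Posterior over finitely many drifts b(x) = -theta * x given the path."""
    log_post = []
    for theta, p in zip(thetas, prior):
        ll = ou_stationary_logpdf(theta, path[0])
        ll += sum(ou_transition_logpdf(theta, delta, x, y)
                  for x, y in zip(path, path[1:]))
        log_post.append(math.log(p) + ll)
    shift = max(log_post)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - shift) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def simulate_ou(theta, delta, n, rng):
    """Exact simulation of the stationary process at times 0, delta, ..., n * delta."""
    x = rng.gauss(0.0, math.sqrt(1.0 / (2.0 * theta)))
    path = [x]
    decay = math.exp(-theta * delta)
    sd = math.sqrt((1.0 - math.exp(-2.0 * theta * delta)) / (2.0 * theta))
    for _ in range(n):
        x = x * decay + sd * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Hypothetical experiment: true theta = 1, uniform prior on three candidates.
rng = random.Random(0)
thetas = [0.5, 1.0, 2.0]
path = simulate_ou(theta=1.0, delta=0.5, n=5000, rng=rng)
weights = posterior_weights(thetas, [1.0 / 3.0] * 3, 0.5, path)
```

With this sample size the posterior mass is expected to concentrate on the data-generating value $\theta = 1$, which is the finite-prior analogue of the consistency statement.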

Proof of Lemma 2. The lemma can be proved by arguments similar to those in
the proof of Lemma 3.2 in van der Meulen and van Zanten (2013). The proof
employs Lemma B.1 from Appendix B that plays the role of Lemma 3.1 from
van der Meulen and van Zanten (2013) in this context. □



Proof of Theorem 2. The proof is an easy generalisation of the proof of Theorem
1 and uses lemmas from Appendix B instead of Lemmas A.1, A.2 and A.3 from
Appendix A. □

In the next lemma we verify the claim made at the end of Example 1.

Lemma 3. The prior $\Pi$ from Example 1 satisfies the requirement (6).

Proof. The proof is similar to a demonstration of an analogous property of the prior
in Example 4.1 in van der Meulen and van Zanten (2013): for every $b \in X(K_1,K_2)$
and positive integer $m$ we have

\[
\begin{aligned}
\sum_{i=1}^d \|b_i - b_{0,i}\|^2_{2,\mu_{b_0}}
&= \sum_{i=1}^d \int_{\|x\| \le m} (b_i(x) - b_{0,i}(x))^2 \pi_{b_0}(x)\,dx
 + \sum_{i=1}^d \int_{\|x\| > m} (b_i(x) - b_{0,i}(x))^2 \pi_{b_0}(x)\,dx \\
&\le d\,\|b - b_0\|^2_{m,d,\infty} + 4K_1^2 d \int_{\|x\| > m} (1 + \|x\|)^2 \pi_{b_0}(x)\,dx.
\end{aligned}
\]

Thanks to the fact that $\mu_{b_0}$ has an exponential moment, the second term on the
right-hand side can be made less than $\varepsilon^2$ by choosing $m$ large enough. Hence



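\[
\Pi\Big(b \in X(K_1,K_2) : \sum_{i=1}^d \|b_i - b_{0,i}\|^2_{2,\mu_{b_0}} < 2\varepsilon^2\Big)
\ge \Pi\Big(b \in X(K_1,K_2) : \|b - b_0\|_{m,d,\infty} < \frac{\varepsilon}{\sqrt{d}}\Big).
\]

For $l$ such that $\varepsilon_l < \varepsilon/\sqrt{d}$, we have by construction of $\Pi$ that the right-hand side
of the above display is bounded from below by q m ,i<li,2/k m ,i > 0. This completes
the proof of the lemma. □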



Appendix A. 



The following result is a restatement of Lemma 5.1 in van der Meulen and van Zanten
(2013).



¹ Note that on p. 58 in van der Meulen and van Zanten (2013) the expression $L_n$ is called the
likelihood, although obviously the likelihood ratio is meant.
Lemma A.1. Let

\[ \mathrm{KL}(b_0,b) = \int_{\mathbb{R}}\int_{\mathbb{R}} \pi_{b_0}(x)\,p_{b_0}(\Delta,x,y)\log\frac{p_{b_0}(\Delta,x,y)}{p_b(\Delta,x,y)}\,dx\,dy. \]

Then for the prior $\Pi$ satisfying property (5), the inequality

\[ \Pi(b \in X(K) : \mathrm{KL}(b_0,b) < \varepsilon) > 0, \quad \forall \varepsilon > 0 \tag{11} \]

holds.



Proof. The same proof as in van der Meulen and van Zanten (2013) goes through.
The only additional clarification we would like to make concerns finiteness of
$\mathrm{KL}(b_0,b)$. The latter follows from the inequality in the proof of Lemma 5.1 in
van der Meulen and van Zanten (2013),

\[ \mathrm{KL}(b_0,b) \le -K(\mu_{b_0},\mu_b) + K(\mathcal{L}_1,\mathcal{L}_2), \]

where

\[
\begin{aligned}
K(\mathcal{L}_1,\mathcal{L}_2)
&= E_{b_0}\Big[\log\frac{\pi_{b_0}(X_0)}{\pi_b(X_0)} + \int_0^\Delta (b_0(X_s) - b(X_s))\,dW_s + \frac{1}{2}\int_0^\Delta (b(X_s) - b_0(X_s))^2\,ds\Big] \\
&= K(\mu_{b_0},\mu_b) + \frac{\Delta}{2}\|b - b_0\|^2_{2,\mu_{b_0}}
\end{aligned}
\tag{12}
\]

is the Kullback-Leibler divergence between the laws $\mathcal{L}_1 = \mathcal{L}(\{X_t, t \in [0,\Delta]\} \mid P_{b_0})$
and $\mathcal{L}_2 = \mathcal{L}(\{X_t, t \in [0,\Delta]\} \mid P_b)$ of the full path $\{X_t, t \in [0,\Delta]\}$ under $P_{b_0}$ and $P_b$,
respectively, while $K(\mu_{b_0},\mu_b)$ is the Kullback-Leibler divergence between the two
invariant measures $\mu_{b_0}$ and $\mu_b$. The second term on the right-hand side of the last
equality in (12) is finite by the exponential decay property of $\pi_{b_0}$, cf. formula (9).
Furthermore, we have

\[
\Big|\int_{\mathbb{R}} \pi_{b_0}(x)\log\frac{\pi_{b_0}(x)}{\pi_b(x)}\,dx\Big|
\le \Big|\log\frac{m_b(\mathbb{R})}{m_{b_0}(\mathbb{R})}\Big|
+ 2\int_{\mathbb{R}} \Big|\int_0^x (b_0(y) - b(y))\,dy\Big|\,\pi_{b_0}(x)\,dx.
\]

This implies finiteness of the first term on the right-hand side of the last equality
in (12), and hence of $\mathrm{KL}(b_0,b)$ too. □



The next lemma is a restatement of Lemma 5.2 in van der Meulen and van Zanten
(2013), cf. also the proof of formula (2.1) in Tang and Ghosal (2007).

Lemma A.2. Suppose that a prior $\Pi$ has property (11). If for a sequence $C_n$ of
measurable subsets of $X(K)$ there exists a constant $c > 0$, such that

\[ e^{nc}\int_{C_n} L_n(b)\,\Pi(db) \to 0, \quad P_{b_0}\text{-a.s.}, \tag{13} \]

then

\[ \Pi(C_n \mid X_0,\ldots,X_{n\Delta}) \to 0, \quad P_{b_0}\text{-a.s.} \]

as $n \to \infty$.

Proof. The proof of Lemma 5.2 in van der Meulen and van Zanten (2013) remains
applicable. The required version of the strong law of large numbers for ergodic
sequences invoked in van der Meulen and van Zanten (2013) follows for instance
from Theorems 3.5.8 and 3.5.7 in Stout (1974). □



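The next lemma is a restatement of Lemma 5.3 in van der Meulen and van Zanten
(2013), the proof of which employs some arguments from Tang and Ghosal (2007).



Lemma A.3. Fix $\varepsilon > 0$ such that $\varepsilon < 2\nu(\mathbb{R})$, take a fixed $f \in C_{bdd}(\mathbb{R})$ such that
$\|f\|_\infty \le 1$, and write

\[ B = \{b \in X(K) : \|P^b_\Delta f - P^{b_0}_\Delta f\|_{1,\nu} > \varepsilon\}. \]

Then there exist a compact set $F \subset \mathbb{R}$, an integer $N > 0$ and bounded intervals
$I_1,\ldots,I_N$ covering $F$, such that

\[ B \subset \Big(\bigcup_{j=1}^N B_j^+\Big) \cup \Big(\bigcup_{j=1}^N B_j^-\Big), \]

where

\[
\begin{aligned}
B_j^+ &= \Big\{b \in B : P^b_\Delta f(x) - P^{b_0}_\Delta f(x) > \frac{\varepsilon}{4\nu(F)}, \ \forall x \in I_j\Big\}, \\
B_j^- &= \Big\{b \in B : P^b_\Delta f(x) - P^{b_0}_\Delta f(x) < -\frac{\varepsilon}{4\nu(F)}, \ \forall x \in I_j\Big\}.
\end{aligned}
\]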



Proof. The proof of this lemma requires Lemma A.4 that corresponds to Lemma
A.1 in van der Meulen and van Zanten (2013). The arguments from the proof of
Lemma 5.3 in van der Meulen and van Zanten (2013) carry over. □



The next lemma is an adaptation of Lemma A.1 in van der Meulen and van Zanten
(2013), but in its proof we need somewhat different arguments than those used in
van der Meulen and van Zanten (2013).



Lemma A.4. For every fixed $f \in C_{bdd}(\mathbb{R})$, the family $\{P^b_\Delta f : b \in X(K)\}$ is locally
uniformly equicontinuous.

Proof. By definition we need to show that the family $\{P^b_\Delta f : b \in X(K)\}$ is uniformly
equicontinuous whenever the argument $x$ of $P^b_\Delta f(x)$ is restricted to an arbitrary
compact set $F$. Since the transition operators form a semigroup, it is enough to
prove the latter claim for $\Delta$ small enough, in particular for $\Delta$ satisfying

\[ K\Delta < \frac{1}{2}. \tag{14} \]

In fact, we have

\[ |P^b_\Delta f(x) - P^b_\Delta f(y)| = \big|P^b_{\Delta/2}\big(P^b_{\Delta/2} f\big)(x) - P^b_{\Delta/2}\big(P^b_{\Delta/2} f\big)(y)\big|, \]

and if $\{P^b_{\Delta/2} f : b \in X(K)\}$ is uniformly equicontinuous when the argument $x$ ranges
in $F$, then it is immediately seen that so is $\{P^b_\Delta f : b \in X(K)\}$, while if not, then
we can reiterate the same argument, but now with $\Delta/2$ and $\Delta/4$ instead of $\Delta$ and
$\Delta/2$ and so on, until (14) is met.
Fix a compact set $F$ and let

\[ \ell_u = \int_0^\Delta b(u + W_s)\,dW_s - \frac{1}{2}\int_0^\Delta b^2(u + W_s)\,ds, \quad L_u = e^{\ell_u}, \]

for a standard Brownian motion $W$. Then as in van der Meulen and van Zanten
(2013), it can be shown that

\[ P^b_\Delta f(x) = E[f(x + W_\Delta)L_x], \]

where the expectation is evaluated under the Wiener measure (the Girsanov theorem
invoked in van der Meulen and van Zanten (2013) is applicable in our case
thanks to the linear growth condition and Corollary 5.16 on p. 200 in Karatzas and Shreve
(1988)). Also

\[
|P^b_\Delta f(x) - P^b_\Delta f(y)| \le E[|f(x + W_\Delta)||L_x - L_y|] + E[L_y|f(x + W_\Delta) - f(y + W_\Delta)|]
:= S_1 + S_2,
\]

where $x, y \in F$. We will bound the two terms $S_1$ and $S_2$ separately.
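By (14) there exists $\tilde{q} > 1$, such that

\[ K\Delta < \frac{1}{2\sqrt{\tilde{q}}}. \tag{15} \]

Fix such $\tilde{q}$ and let $q$ be determined as that root of the equation

\[ \tilde{q} = 2\Big(q^2 - \frac{q}{2}\Big) \tag{16} \]

that is larger than 1. Next set $r = q/(q - 1)$. Note that $r > 1$ and that $1/r + 1/q = 1$.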
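
As a small worked illustration of this choice of exponents, the Python sketch below assumes that equation (16) is the quadratic $2q^2 - q = \tilde{q}$, which is the relation suggested by the Cauchy-Schwarz step leading to (18). Under that assumption the root exceeding 1 is $q = (1 + \sqrt{1 + 8\tilde{q}})/4$. The function name `holder_exponents` is hypothetical.

```python
import math


def holder_exponents(q_tilde):
    """Return (q, r) with 2*q**2 - q == q_tilde, q > 1 and 1/q + 1/r == 1.

    Assumes the quadratic form of (16) stated in the lead-in; requires
    q_tilde > 1 so that the larger root exceeds 1.
    """
    if q_tilde <= 1.0:
        raise ValueError("q_tilde must exceed 1")
    q = (1.0 + math.sqrt(1.0 + 8.0 * q_tilde)) / 4.0
    r = q / (q - 1.0)
    return q, r


# Hypothetical value q_tilde = 3 gives q = 3/2 and the conjugate exponent r = 3.
q, r = holder_exponents(3.0)
```

The other root, $(1 - \sqrt{1 + 8\tilde{q}})/4$, is negative, so the requirement "larger than 1" singles out a unique $q$.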

To bound $S_1$, we apply an elementary inequality $|e^a - e^b| \le |a - b||e^a + e^b|$ for
$a, b \in \mathbb{R}$ and Hölder's inequality with exponents $r$ and $q$ defined as above to obtain

\[
\begin{aligned}
S_1 &\le \|f\|_\infty E[|L_x - L_y|] \\
&\le \|f\|_\infty E[|\ell_x - \ell_y||L_x + L_y|] \\
&\le \|f\|_\infty \{E[|\ell_x - \ell_y|^r]\}^{1/r}\{E[|L_x + L_y|^q]\}^{1/q}.
\end{aligned}
\]

In order to bound $S_1$, we hence need to bound the last two factors on the right-hand
side of the last inequality in the above display. We first treat the first of these two.
Note that
\[
\begin{aligned}
\ell_x - \ell_y &= \int_0^\Delta (b(x + W_s) - b(y + W_s))\,dW_s - \frac{1}{2}\int_0^\Delta (b^2(x + W_s) - b^2(y + W_s))\,ds \\
&=: S_3 + S_4.
\end{aligned}
\tag{17}
\]

The $c_r$-inequality yields that in order to bound $\{E[|\ell_x - \ell_y|^r]\}^{1/r}$, it is enough
to bound $E[|S_3|^r]$ and $E[|S_4|^r]$. We bound the first of these two expectations as
follows: by the Burkholder-Davis-Gundy inequality, see Theorem 3.28 on p. 166 in
Karatzas and Shreve (1988),

\[
E\Big[\Big|\int_0^\Delta (b(x + W_s) - b(y + W_s))\,dW_s\Big|^r\Big]
\le C_r E\Big[\Big(\int_0^\Delta (b(x + W_s) - b(y + W_s))^2\,ds\Big)^{r/2}\Big],
\]

where $C_r > 0$ is a universal constant independent of $b$, $x$ and $y$. For a constant $R > 0$
and the set $F' = \{u + v : u \in F, v \in [-R, R]\}$ by the Cauchy-Schwarz inequality the
expectation on the right-hand side of the above display can be handled as follows:
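\[
\begin{aligned}
E\Big[\Big(\int_0^\Delta (b(x + W_s) - b(y + W_s))^2\,ds\Big)^{r/2}\Big]
&= E\Big[\Big(\int_0^\Delta (b(x + W_s) - b(y + W_s))^2\,ds\Big)^{r/2} 1_{[\sup_{s \le \Delta}|W_s| \le R]}\Big] \\
&\quad + E\Big[\Big(\int_0^\Delta (b(x + W_s) - b(y + W_s))^2\,ds\Big)^{r/2} 1_{[\sup_{s \le \Delta}|W_s| > R]}\Big] \\
&\le \Delta^{r/2} \sup_{\substack{u,v \in F' \\ |u - v| \le |x - y|}} |b(u) - b(v)|^r \\
&\quad + \Big\{E\Big[\Big(\int_0^\Delta (b(x + W_s) - b(y + W_s))^2\,ds\Big)^{r}\Big]\Big\}^{1/2}
\Big\{P\Big(\sup_{s \le \Delta}|W_s| > R\Big)\Big\}^{1/2}.
\end{aligned}
\]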

Thanks to the fact that $X(K)$ is a locally uniformly equicontinuous family of
functions, for a fixed $R$ the first term on the right-hand side of the last inequality can
be made arbitrarily small uniformly in $b \in X(K)$ by choosing $\delta$ small enough and
$|x - y| < \delta$. Also $P(\sup_{s \le \Delta}|W_s| > R)$ can be made arbitrarily small by choosing
$R$ large enough. Finally,

\[
E\Big[\Big(\int_0^\Delta (b(x + W_s) - b(y + W_s))^2\,ds\Big)^{r}\Big]
\le \Delta^{r/q} E\Big[\int_0^\Delta |b(x + W_s) - b(y + W_s)|^{2r}\,ds\Big].
\]

The expectation on the right-hand side is bounded by a universal constant
independent of particular $x, y \in F$ and $b \in X(K)$. This can be seen by a simple, but
lengthy computation employing the Fubini theorem, the linear growth condition on
$b$, the $c_{2r}$-inequality and the fact that $W$ has moments of all orders. This
completes bounding $E[|S_3|^r]$, which hence can be made arbitrarily small uniformly in
$b \in X(K)$, once $|x - y| < \delta$ for $\delta > 0$ small enough. The term $E[|S_4|^r]$ can be
bounded using similar computations by employing the inequality

\[ |b^2(u) - b^2(v)| \le K(2 + |u| + |v|)|b(u) - b(v)| \]

and the Cauchy-Schwarz inequality (twice), yielding together

\[
\begin{aligned}
E\Big[\Big|\int_0^\Delta (b^2(x + W_s) - b^2(y + W_s))\,ds\Big|^r\Big]
&\le \Big\{E\Big[\Big(\int_0^\Delta (b(x + W_s) - b(y + W_s))^2\,ds\Big)^{r}\Big]\Big\}^{1/2} \\
&\quad \times \Big\{E\Big[\Big(\int_0^\Delta (b(x + W_s) + b(y + W_s))^2\,ds\Big)^{r}\Big]\Big\}^{1/2}.
\end{aligned}
\]

The first factor on the right-hand side can be made arbitrarily small uniformly in
$b \in X(K)$ by taking $\delta$ small (cf. above), while the second factor remains bounded
uniformly in $b \in X(K)$ and $x, y \in F$. To complete bounding $S_1$, we need to bound
the right-hand side of the inequality

\[ E[|L_x + L_y|^q] \le c_q E[L_x^q] + c_q E[L_y^q]. \]
Since obviously both terms on the right-hand side can be bounded in exactly the
same manner, we will only give an argument for one of them. By the Cauchy-Schwarz
inequality applied to two random variables

\[
\exp\Big(\Big(q^2 - \frac{q}{2}\Big)\int_0^\Delta b^2(x + W_s)\,ds\Big), \quad
\exp\Big(\int_0^\Delta q\,b(x + W_s)\,dW_s - \int_0^\Delta q^2 b^2(x + W_s)\,ds\Big),
\]

we have

\[ E[L_x^q] \le \Big\{E\Big[\exp\Big(2\Big(q^2 - \frac{q}{2}\Big)\int_0^\Delta b^2(x + W_s)\,ds\Big)\Big]\Big\}^{1/2}. \tag{18} \]

Here we used the fact that

\[ E\Big[\exp\Big(\int_0^\Delta 2q\,b(x + W_s)\,dW_s - \frac{1}{2}\int_0^\Delta 4q^2 b^2(x + W_s)\,ds\Big)\Big] = 1, \tag{19} \]

since the process under the expectation sign is a martingale and has expectation
equal to one (this is due to the linear growth condition and Corollary 5.16 on p. 200
in Karatzas and Shreve (1988)). Hence it remains to bound the right-hand side of
(18), which we denote by $S_5$. By the linear growth condition we have

\[ S_5^2 \le \exp\big(2\tilde{q}K^2\Delta(1 + |x|)^2\big)\,E\Big[\exp\Big(2\tilde{q}K^2\int_0^\Delta W_s^2\,ds\Big)\Big]. \]



Showing finiteness of the expectation on the right-hand side is standard: by Doob's
maximal inequality for submartingales (see Theorem 3.8 (iv) on pp. 13-14 in Karatzas
and Shreve (1988); that the exponential on the right-hand side of the first line of the
displayed formula below is a submartingale follows from Problem 3.7 on p. 13 in
Karatzas and Shreve (1988)),

\[
\begin{aligned}
E\Big[\exp\Big(2\tilde{q}K^2\int_0^\Delta W_s^2\,ds\Big)\Big]
&\le E\Big[\sup_{s \le \Delta}\exp\big(2\tilde{q}K^2\Delta W_s^2\big)\Big] \\
&\le 4E\big[\exp\big(2\tilde{q}K^2\Delta W_\Delta^2\big)\big] < \infty.
\end{aligned}
\]

Here in the last inequality we used (15).
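
Finiteness in the last step rests on a Gaussian computation: for $W_\Delta \sim N(0,\Delta)$ one has $E[\exp(aW_\Delta^2)] = (1 - 2a\Delta)^{-1/2}$ when $2a\Delta < 1$, and the expectation is infinite otherwise. With $a = 2\tilde{q}K^2\Delta$ this becomes a smallness condition on $K\Delta$. The Python sketch below is only a sanity check of that formula; the function names and the values of $K$, $\Delta$ and $\tilde{q}$ are chosen for illustration and are not taken from the text.

```python
import math


def gaussian_square_mgf(a, delta):
    """Closed form of E[exp(a * W**2)] for W ~ N(0, delta).

    Returns math.inf when 2 * a * delta >= 1, where the integral diverges.
    """
    if 2.0 * a * delta >= 1.0:
        return math.inf
    return 1.0 / math.sqrt(1.0 - 2.0 * a * delta)


def gaussian_square_mgf_quadrature(a, delta, half_width=12.0, n=24001):
    """Trapezoidal approximation of the same expectation on [-half_width, half_width]."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        w = -half_width + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(a * w * w - w * w / (2.0 * delta))
    return h * total / math.sqrt(2.0 * math.pi * delta)


# Hypothetical constants: K = 1, delta = 0.25, q_tilde = 3, so a = 1.5.
K, delta, q_tilde = 1.0, 0.25, 3.0
a = 2.0 * q_tilde * K ** 2 * delta
closed_form = gaussian_square_mgf(a, delta)
numerical = gaussian_square_mgf_quadrature(a, delta)
```

For these values $2a\Delta = 0.75 < 1$, so both computations should return a finite number; raising $a$ to $2$ with the same $\Delta$ makes the closed form infinite.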

A conclusion that follows from the above bounds is that $S_1$ can be made
arbitrarily small as soon as $|x - y| < \delta$ for small enough $\delta$. The bound on $S_1$ will be
true uniformly in $b \in X(K)$.

In order to bound $S_2$, we again use Hölder's inequality to get

\[ S_2 \le \{E[L_y^q]\}^{1/q}\{E[|f(x + W_\Delta) - f(y + W_\Delta)|^r]\}^{1/r}. \]

The first factor on the right-hand side can be bounded as above. The second factor
can be made arbitrarily small as soon as $|x - y| < \delta$ for small enough $\delta$. Indeed, for
a constant $R > 0$ write

\[
\begin{aligned}
E[|f(x + W_\Delta) - f(y + W_\Delta)|^r]
&= E\big[|f(x + W_\Delta) - f(y + W_\Delta)|^r 1_{[|W_\Delta| > R]}\big] \\
&\quad + E\big[|f(x + W_\Delta) - f(y + W_\Delta)|^r 1_{[|W_\Delta| \le R]}\big] \\
&\le (2\|f\|_\infty)^r P(|W_\Delta| > R) \\
&\quad + E\big[|f(x + W_\Delta) - f(y + W_\Delta)|^r 1_{[|W_\Delta| \le R]}\big].
\end{aligned}
\]

It is obvious that the first term on the right-hand side of the last inequality can be
made arbitrarily small by selecting $R$ large enough. However, so can be the second
one upon fixing $R$ by taking $|x - y| < \delta$ for small enough $\delta > 0$, since the function
$f$ is uniformly continuous on compacts. This completes the proof. □



Appendix B. 

Lemma B.1. Let $b, \tilde{b} \in X(K_1,K_2)$. Fix $t > 0$. If $b \ne \tilde{b}$, then $P_t^b \ne P_t^{\tilde{b}}$.

Proof. The proof is similar to the proof of Lemma 3.1 in van der Meulen and van Zanten
(2013). By continuity of $b$ and $\tilde{b}$ we have that if $b \ne \tilde{b}$, this in fact holds on a set
of positive Lebesgue measure. Then also $V_b \ne V_{\tilde{b}}$ on a set of positive Lebesgue
measure and therefore $\pi_b \ne \pi_{\tilde{b}}$ on a set of positive Lebesgue measure, for instance
some open ball in $\mathbb{R}^d$. Now assume that $P_t^b = P_t^{\tilde{b}}$. Then for any bounded measurable
function $f$ and any positive integer $m$, by the semigroup property of $P_t^b$ we have
that

\[ E_x^b[f(X_{mt})] = (P_t^b)^m f(x) = (P_t^{\tilde{b}})^m f(x) = E_x^{\tilde{b}}[f(X_{mt})]. \]

Letting $m \to \infty$, the above display and ergodicity give that

\[ \int_{\mathbb{R}^d} f(y)\pi_b(y)\,dy = \int_{\mathbb{R}^d} f(y)\pi_{\tilde{b}}(y)\,dy. \]

Hence $\pi_b = \pi_{\tilde{b}}$ a.e., and in fact by continuity $\pi_b = \pi_{\tilde{b}}$ everywhere. This is a
contradiction and thus $b \ne \tilde{b}$ implies $P_t^b \ne P_t^{\tilde{b}}$. □

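Lemma B.2. Fix $\varepsilon > 0$ such that $\varepsilon < 2\nu(\mathbb{R}^d)$, take a fixed $f \in C_{bdd}(\mathbb{R}^d)$ such that
$\|f\|_\infty \le 1$, and write

\[ B = \{b \in X(K_1,K_2) : \|P^b_\Delta f - P^{b_0}_\Delta f\|_{1,\nu} > \varepsilon\}. \]

Then there exist a compact set $F \subset \mathbb{R}^d$, an integer $N > 0$ and cubes $I_1,\ldots,I_N$
covering $F$, such that

\[ B \subset \Big(\bigcup_{j=1}^N B_j^+\Big) \cup \Big(\bigcup_{j=1}^N B_j^-\Big), \]

where

\[
\begin{aligned}
B_j^+ &= \Big\{b \in B : P^b_\Delta f(x) - P^{b_0}_\Delta f(x) > \frac{\varepsilon}{4\nu(F)}, \ \forall x \in I_j\Big\}, \\
B_j^- &= \Big\{b \in B : P^b_\Delta f(x) - P^{b_0}_\Delta f(x) < -\frac{\varepsilon}{4\nu(F)}, \ \forall x \in I_j\Big\}.
\end{aligned}
\]

Proof. The proof of Lemma 5.3 in van der Meulen and van Zanten (2013) carries
over, provided one redefines the intervals $I_j$ of length $\delta/2 > 0$ from that
proof to be cubes with sides of length $\delta/2$, and uses instead of Lemma A.1 from
van der Meulen and van Zanten (2013) Lemma B.3 below. □



Lemma B.3. For a fixed $f \in C_{bdd}(\mathbb{R}^d)$ and $t > 0$, the family $\{P_t^b f : b \in X(K_1,K_2)\}$
is a locally uniformly equicontinuous family of functions.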



22 



SHOTA GUGUSHVILI AND PETER SPREIJ 



Proof. Let

\[
\ell_u = \sum_{i=1}^d \int_0^\Delta b_i(u + W_s)\,dW_{i,s} - \frac{1}{2}\sum_{i=1}^d \int_0^\Delta b_i^2(u + W_s)\,ds, \quad L_u = e^{\ell_u},
\]

for a standard $d$-dimensional Brownian motion $W = (W_1,\ldots,W_d)$. Then, employing
the Girsanov theorem, as in the proof of Lemma A.1 in van der Meulen and van Zanten
(2013), see also the proof of Lemma A.4, it can be shown that

\[ P^b_\Delta f(x) = E[f(x + W_\Delta)L_x], \]

where the expectation is evaluated under the Wiener measure. From this point
on the proof is a generalisation of the arguments in the proof of Lemma A.4 from
Appendix A to the multidimensional setting. In particular, as in that proof, it is
enough to prove the lemma for $\Delta$ such that

\[ \Delta K_1 < \frac{1}{2\sqrt{d}}. \]

In order to prove the lemma, we need to show that the family of functions $\{P^b_\Delta f :
b \in X(K_1,K_2)\}$ is uniformly equicontinuous whenever the argument $x$ of $P^b_\Delta f(x)$
is restricted to an arbitrary compact set $F$. Fix a compact set $F \subset \mathbb{R}^d$. Throughout
this proof we assume $x, y \in F$. We have

\[
|P^b_\Delta f(x) - P^b_\Delta f(y)| \le E[|f(x + W_\Delta)||L_x - L_y|] + E[L_y|f(x + W_\Delta) - f(y + W_\Delta)|]
:= S_1 + S_2.
\]

We will bound the two terms $S_1$ and $S_2$ separately. There exists $\tilde{q} > 1$, such that

\[ K_1\Delta < \frac{1}{2\sqrt{d\tilde{q}}}. \tag{20} \]

Fix such $\tilde{q}$ and let $q$ be determined as that root of the equation

\[ \tilde{q} = 2\Big(q^2 - \frac{q}{2}\Big) \tag{21} \]

that is larger than 1. Next set $r = q/(q - 1)$. We first bound $S_1$. As in the proof of
Lemma A.4 from Appendix A we have that

\[ S_1 \le \|f\|_\infty\{E[|\ell_x - \ell_y|^r]\}^{1/r}\{E[|L_x + L_y|^q]\}^{1/q}. \]

The $c_r$-inequality gives that in order to bound $\{E[|\ell_x - \ell_y|^r]\}^{1/r}$, it is enough to
bound the terms

\[
E\Big[\Big|\int_0^\Delta (b_i(x + W_s) - b_i(y + W_s))\,dW_{i,s}\Big|^r\Big], \quad
E\Big[\Big|\int_0^\Delta (b_i^2(x + W_s) - b_i^2(y + W_s))\,ds\Big|^r\Big]
\]

for $i = 1,\ldots,d$. Since the arguments are the same for any $i$, we henceforth fix a
particular $i$. As in Lemma A.4 the Burkholder-Davis-Gundy inequality gives

\[
E\Big[\Big|\int_0^\Delta (b_i(x + W_s) - b_i(y + W_s))\,dW_{i,s}\Big|^r\Big]
\le C_r E\Big[\Big(\int_0^\Delta (b_i(x + W_s) - b_i(y + W_s))^2\,ds\Big)^{r/2}\Big],
\]

where $C_r > 0$ is a universal constant independent of $b$. For a constant $R > 0$ and
the set $F' = \{u + v : u \in F, \|v\| \le R\}$ by the Cauchy-Schwarz inequality the
expectation on the right-hand side of the above display can be bounded as follows:

\[
\begin{aligned}
E\Big[\Big(\int_0^\Delta (b_i(x + W_s) - b_i(y + W_s))^2\,ds\Big)^{r/2}\Big]
&= E\Big[\Big(\int_0^\Delta (b_i(x + W_s) - b_i(y + W_s))^2\,ds\Big)^{r/2} 1_{[\sup_{s \le \Delta}\|W_s\| \le R]}\Big] \\
&\quad + E\Big[\Big(\int_0^\Delta (b_i(x + W_s) - b_i(y + W_s))^2\,ds\Big)^{r/2} 1_{[\sup_{s \le \Delta}\|W_s\| > R]}\Big] \\
&\le \Delta^{r/2} \sup_{\substack{u,v \in F' \\ \|u - v\| \le \|x - y\|}} |b_i(u) - b_i(v)|^r \\
&\quad + \Big\{E\Big[\Big(\int_0^\Delta (b_i(x + W_s) - b_i(y + W_s))^2\,ds\Big)^{r}\Big]\Big\}^{1/2}
\Big\{P\Big(\sup_{s \le \Delta}\|W_s\| > R\Big)\Big\}^{1/2}.
\end{aligned}
\]

Since $b$ has partial derivatives bounded in absolute value by $K_2$, the first term on
the right-hand side of the above display can be made arbitrarily small by choosing
$\delta$ small enough and $\|x - y\| < \delta$. Furthermore, the term

\[ \Big\{P\Big(\sup_{s \le \Delta}\|W_s\| > R\Big)\Big\}^{1/2} \]

can be made arbitrarily small by choosing $R$ large enough. A lengthy, but easy
computation shows that the term

\[ \Big\{E\Big[\Big(\int_0^\Delta (b_i(x + W_s) - b_i(y + W_s))^2\,ds\Big)^{r}\Big]\Big\}^{1/2} \]

is bounded by a constant independent of $b$; cf. the arguments in the proof of Lemma
A.4 from Appendix A. Consequently, the term

\[ E\Big[\Big|\int_0^\Delta (b_i(x + W_s) - b_i(y + W_s))\,dW_{i,s}\Big|^r\Big] \]

can be made arbitrarily small, once $\delta$ is chosen small enough and $\|x - y\| < \delta$. The
term

\[ E\Big[\Big|\int_0^\Delta (b_i^2(x + W_s) - b_i^2(y + W_s))\,ds\Big|^r\Big] \]

can be shown to be bounded uniformly in $b \in X(K_1,K_2)$ by employing similar
techniques; cf. the proof of Lemma A.4 from Appendix A. Next we need to bound
the right-hand side of the inequality

\[ E[|L_x + L_y|^q] \le c_q E[L_x^q] + c_q E[L_y^q]. \]

Since obviously both terms on the right-hand side can be bounded in exactly the
same manner, we will only give an argument for the first one of them. By the
Cauchy-Schwarz inequality applied to the random variables

\[
\exp\Big(\Big(q^2 - \frac{q}{2}\Big)\sum_{i=1}^d \int_0^\Delta b_i^2(x + W_s)\,ds\Big), \quad
\exp\Big(\sum_{i=1}^d \int_0^\Delta q\,b_i(x + W_s)\,dW_{i,s} - \sum_{i=1}^d \int_0^\Delta q^2 b_i^2(x + W_s)\,ds\Big),
\]

as in the proof of Lemma A.4 from Appendix A we have

\[ E[L_x^q] \le \Big\{E\Big[\exp\Big(2\Big(q^2 - \frac{q}{2}\Big)\sum_{i=1}^d \int_0^\Delta b_i^2(x + W_s)\,ds\Big)\Big]\Big\}^{1/2}. \tag{22} \]

Hence it remains to bound the right-hand side of the above display, which we denote
by $S_5$. By the linear growth condition we have

\[ S_5^2 \le \exp\big(2d\tilde{q}K_1^2\Delta(1 + \|x\|)^2\big)\,E\Big[\exp\Big(2d\tilde{q}K_1^2\int_0^\Delta \|W_s\|^2\,ds\Big)\Big]. \]

By Doob's maximal inequality for submartingales and independence of scalar
Brownian motions $W_i$'s,

\[
E\Big[\exp\Big(2d\tilde{q}K_1^2\int_0^\Delta \|W_s\|^2\,ds\Big)\Big]
\le 4\prod_{i=1}^d E\big[\exp\big(2d\tilde{q}K_1^2\Delta W_{i,\Delta}^2\big)\big] < \infty.
\]

Here in the last inequality we used (20). The conclusion is that the term $S_1$ can
be made arbitrarily small by taking $\delta$ small and $\|x - y\| < \delta$. The proof is now
completed as in the case of Lemma A.4 from Appendix A: by Hölder's inequality

\[ S_2 \le \{E[L_y^q]\}^{1/q}\{E[|f(x + W_\Delta) - f(y + W_\Delta)|^r]\}^{1/r}. \]

The first factor on the right-hand side can be bounded as above uniformly in $b \in
X(K_1,K_2)$. The second factor can be made arbitrarily small as soon as $\|x - y\| < \delta$
for small enough $\delta$: for a constant $R > 0$,

\[
\begin{aligned}
E[|f(x + W_\Delta) - f(y + W_\Delta)|^r]
&= E\big[|f(x + W_\Delta) - f(y + W_\Delta)|^r 1_{[\|W_\Delta\| > R]}\big] \\
&\quad + E\big[|f(x + W_\Delta) - f(y + W_\Delta)|^r 1_{[\|W_\Delta\| \le R]}\big] \\
&\le (2\|f\|_\infty)^r P(\|W_\Delta\| > R) \\
&\quad + E\big[|f(x + W_\Delta) - f(y + W_\Delta)|^r 1_{[\|W_\Delta\| \le R]}\big].
\end{aligned}
\]

The first term on the right-hand side of the last inequality can be made arbitrarily
small by selecting $R$ large enough. Upon fixing $R$, so can be the second one by taking
$\|x - y\| < \delta$ for small enough $\delta > 0$. Combination of all the above intermediate
results entails the statement of the lemma. □



Lemma B.4. Let

\[ \mathrm{KL}(b_0,b) = \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} \pi_{b_0}(x)\,p_{b_0}(\Delta,x,y)\log\frac{p_{b_0}(\Delta,x,y)}{p_b(\Delta,x,y)}\,dx\,dy \]

and assume that the weak solution to (1) is initialised at $\mu_{b_0}$. Then for the prior $\Pi$
satisfying property (6), the inequality

\[ \Pi(b \in X(K_1,K_2) : \mathrm{KL}(b_0,b) < \varepsilon) > 0, \quad \forall \varepsilon > 0 \tag{23} \]

holds.

Proof. The proof is an obvious modification of the proof of Lemma 5.1 in
van der Meulen and van Zanten (2013): as in the proof of Lemma A.1 in Appendix
A, we need to verify additionally that the Kullback-Leibler divergence $K(\mu_b,\mu_{\tilde{b}})$
is finite for any $b, \tilde{b} \in X(K_1,K_2)$. This, however, follows from Proposition 1.1 in
Gobet (2002). □



Lemma B.5. Suppose that the prior $\Pi$ on $X(K_1,K_2)$ has the property (23) and
assume that the weak solution to (1) is initialised at $\mu_{b_0}$. If for a sequence $C_n$ of
measurable subsets of $X(K_1,K_2)$ there exists a constant $c > 0$, such that

\[ e^{nc}\int_{C_n} L_n(b)\,\Pi(db) \to 0, \quad P_{b_0}\text{-a.s.}, \]

then

\[ \Pi(C_n \mid X_0,\ldots,X_{n\Delta}) \to 0, \quad P_{b_0}\text{-a.s.} \]

as $n \to \infty$.



Proof. The proof is an easy generalisation of the proof of Lemma 5.2 in van der Meulen and van Zanten
(2013). □